Generating Natural Language from Linked Data: Unsupervised template extraction
We propose an architecture for generating natural language from Linked Data that automatically learns sentence templates and statistical document planning from parallel RDF datasets and text. We have built a proof-of-concept system (LOD-DEF) trained on unannotated text from the Simple English Wikipedia and RDF triples from DBpedia, focusing exclusively on factual, non-temporal information. The goal of the system is to generate short descriptions, equivalent to Wikipedia stubs, of entities found in Linked Datasets. We have evaluated the LOD-DEF system against a simple generate-from-triples baseline and against human-generated output. In a human evaluation, LOD-DEF significantly outperforms the baseline on two of the three measures: non-redundancy, and structure and coherence.
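The core idea of learning sentence templates from parallel RDF and text can be illustrated with a minimal sketch: where a sentence mentions the labels of both the subject and object of a triple, the mentions are replaced with slots keyed by the triple's predicate, and the resulting template can later be filled with a new triple. All function and slot names below are hypothetical illustrations, not the LOD-DEF implementation.

```python
# Minimal sketch of slot-based template extraction from parallel RDF
# triples and text (hypothetical names; not the LOD-DEF algorithm).

def extract_template(triple, sentence):
    """Replace subject/object label mentions with predicate-typed slots."""
    subj, pred, obj = triple
    if subj in sentence and obj in sentence:
        template = sentence.replace(subj, "[SUBJ]")
        return template.replace(obj, f"[OBJ:{pred}]")
    return None  # the sentence does not verbalise this triple

def fill_template(template, triple):
    """Instantiate a learned template with a new triple."""
    subj, pred, obj = triple
    return template.replace("[SUBJ]", subj).replace(f"[OBJ:{pred}]", obj)

triple = ("Edinburgh", "country", "Scotland")
sentence = "Edinburgh is a city in Scotland."
tpl = extract_template(triple, sentence)
# tpl == "[SUBJ] is a city in [OBJ:country]."
new_sentence = fill_template(tpl, ("Glasgow", "country", "Scotland"))
# new_sentence == "Glasgow is a city in Scotland."
```

A real system would additionally need entity linking to match surface mentions to RDF resources, and statistical scoring to rank competing templates; this sketch only shows the substitution step.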
A Semantic Web of Know-How: Linked Data for Community-Centric Tasks
This paper proposes a novel framework for representing community know-how on the Semantic Web. Procedural knowledge generated by web communities typically takes the form of natural language instructions or videos and is largely unstructured. The absence of semantic structure impedes the deployment of many useful applications, in particular the ability to discover and integrate know-how automatically. We discuss the characteristics of community know-how and argue that existing knowledge representation frameworks fail to represent it adequately. We present a novel framework for representing the semantic structure of community know-how and demonstrate the feasibility of our approach by providing a concrete implementation which includes a method for automatically acquiring procedural knowledge for real-world tasks.
Comment: 6th International Workshop on Web Intelligence & Communities (WIC14), Proceedings of the companion publication of the 23rd International Conference on World Wide Web (WWW 2014)
Supporting text mining for e-Science: the challenges for Grid-enabled natural language processing
Over the last few years, language technology has moved rapidly from 'applied research' to 'engineering', and from small-scale to large-scale engineering. Applications such as advanced text mining systems are feasible, but very resource-intensive, while research seeking to address the underlying language processing questions faces very real practical and methodological limitations. The e-Science vision, and the creation of the e-Science Grid, promises the level of integrated large-scale technological support required to sustain this important and successful new technology area. In this paper, we discuss the foundations for the deployment of text mining and other language technology on the Grid - the protocols and tools required to build distributed large-scale language technology systems, meeting the needs of users, application builders and researchers.
Exploring data-in-use: the value of data for Local Government
The power of data to support digital transformation within the context of e-Government is frequently underestimated. In this exploratory research, we develop a conceptual framework where the value of data stems from how it is used. We claim that the impact of digital transformation in the public sector presupposes an organisational culture that recognises and values data-in-use, by which is meant the practical application of data for a specific purpose, particularly by staff who deliver services. Through the lens of two 'worldviews' of data sharing, we present case studies of data use in two local authorities in Scotland. We claim that developing a culture where data is leveraged to derive insights for organisational activity requires combining working practices and technical infrastructure that centre on co-creating value with data. The presence of data intermediaries can support effective data-in-use to establish a healthy internal data ecosystem. Our research illustrates that local authorities within Scotland are still at an early stage of developing this culture.
Computational semantics in the Natural Language Toolkit
NLTK, the Natural Language Toolkit, is an open source project whose goals include providing students with software and language resources that will help them to learn basic NLP. Until now, the program modules in NLTK have covered such topics as tagging, chunking, and parsing, but have not incorporated any aspect of semantic interpretation. This paper describes recent work on building a new semantics package for NLTK. This currently allows semantic representations to be built compositionally as a part of sentence parsing, and for the representations to be evaluated by a model checker. We present the main components of this work, and consider comparisons between the Python implementation and the Prolog approach developed by Blackburn and Bos (2005).
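The model-checking step the abstract describes can be illustrated in plain Python: a model pairs a domain of entities with a valuation mapping constants to entities and predicate names to sets of tuples, and a formula is evaluated against that valuation. This is a toy sketch of the idea, not the actual API of NLTK's semantics package.

```python
# Toy first-order model checker, illustrating the kind of evaluation the
# NLTK semantics package performs (a sketch, not the NLTK API).

# A model: a domain of entities plus a valuation mapping constants to
# entities and predicate names to sets of argument tuples.
domain = {"j", "m"}
valuation = {
    "john": "j",
    "mary": "m",
    "walk": {("j",)},        # john walks
    "see": {("j", "m")},     # john sees mary
}

def evaluate(formula, val=valuation):
    """Evaluate a formula given as a nested tuple, e.g. ("walk", "john")."""
    op = formula[0]
    if op == "not":
        return not evaluate(formula[1], val)
    if op == "and":
        return evaluate(formula[1], val) and evaluate(formula[2], val)
    # otherwise an atomic predication: (pred, arg1, arg2, ...)
    args = tuple(val[a] for a in formula[1:])
    return args in val[op]

evaluate(("walk", "john"))                                     # True
evaluate(("and", ("walk", "john"), ("see", "john", "mary")))   # True
evaluate(("not", ("walk", "mary")))                            # True
```

Quantifiers and variable assignments are omitted here; a full checker, like the one described in the paper, would also evaluate quantified formulas by iterating assignments over the domain.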